{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/preprocessing/pytorch/accelerate_pytorch_cv_data_pipeline.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accelerate Computer Vision Data Processing Pipeline"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use `transforms` and `datasets` from `bigdl.nano.pytorch.vision`, which takes advantage of OpenCV and libjpeg-turbo, as a drop-in replacement of `torchvision.transforms` and `torchvision.datasets` to easily accelerate your computer vision data processing pipeline in PyTorch or PyTorch Lightning applications."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "In order to achieve this, you need to install BigDL-Nano for PyTorch first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "!pip install --pre --upgrade bigdl-nano[pytorch] # install the nightly-built version\n",
    "!source bigdl-nano-init # set environment variables"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that you would like to preprocess [OxfordIIITPet dataset](https://pytorch.org/vision/main/generated/torchvision.datasets.OxfordIIITPet.html). To accelerate this preocess, you could simply **import BigDL-Nano** `transforms` **and** `datasets` **to replace** `torchvision.transforms` **and** `torchvision.datasets`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from torchvision import transforms\n",
    "# from torchvision.datasets import OxfordIIITPet\n",
    "from bigdl.nano.pytorch.vision import transforms\n",
    "from bigdl.nano.pytorch.vision.datasets import OxfordIIITPet\n",
    "\n",
    "# Data processing steps are the same as using torchvision\n",
    "train_transform = transforms.Compose([transforms.Resize(256),\n",
    "                                        transforms.RandomCrop(224),\n",
    "                                        transforms.RandomHorizontalFlip(),\n",
    "                                        transforms.ColorJitter(brightness=.5, hue=.3),\n",
    "                                        transforms.ToTensor(),\n",
    "                                        transforms.Normalize([0.485, 0.456, 0.406],\n",
    "                                                            [0.229, 0.224, 0.225])])\n",
    "val_transform = transforms.Compose([transforms.Resize(256),\n",
    "                                    transforms.CenterCrop(224),\n",
    "                                    transforms.ToTensor(),\n",
    "                                    transforms.Normalize([0.485, 0.456, 0.406],\n",
    "                                                            [0.229, 0.224, 0.225])])\n",
    "\n",
    "train_dataset = OxfordIIITPet(root=\"/tmp/data\", transform=train_transform, download=True)\n",
    "val_dataset = OxfordIIITPet(root=\"/tmp/data\", transform=val_transform)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You could then create train/validate dataloader as normal:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# obtain training indices that will be used for validation\n",
    "import torch\n",
    "\n",
    "indices = torch.randperm(len(train_dataset))\n",
    "val_size = len(train_dataset) // 4\n",
    "train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])\n",
    "val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])\n",
    "\n",
    "# create dataloaders\n",
    "from torch.utils.data.dataloader import DataLoader\n",
    "\n",
    "train_dataloader = DataLoader(train_dataset, batch_size=32)\n",
    "val_dataloader = DataLoader(val_dataset, batch_size=32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📚 **Related Readings**\n",
    "> \n",
    "> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "nano-tf",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.7.13 (default, Mar 29 2022, 02:18:16) \n[GCC 7.5.0]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "402532f56d486e9f832908f31130bbdf12bd8cb099dfb226783aa2c6b1479100"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
